Skip to content

fix(responses): split durable previous_response_id continuity from #211#270

Closed
yigitkonur wants to merge 1 commit intoSoju06:mainfrom
yigitkonur:feat/sqlite-durable-response-continuity
Closed

fix(responses): split durable previous_response_id continuity from #211#270
yigitkonur wants to merge 1 commit intoSoju06:mainfrom
yigitkonur:feat/sqlite-durable-response-continuity

Conversation

@yigitkonur
Copy link
Copy Markdown
Contributor

Summary

  • persist durable response snapshots keyed by response_id and replay previous_response_id across HTTP and WebSocket Responses traffic
  • keep SQLite as the default backend; this PR does not require PostgreSQL/Neon and does not change the backend contract
  • scope snapshots by api_key_id, prefer the originating account when replaying, and retry one websocket request after an early upstream disconnect before response.created

Notes

Testing

  • rtk /Users/yigitkonur/dev/codex-lb/.venv/bin/pytest tests/integration/test_migrations.py -q
  • rtk /Users/yigitkonur/dev/codex-lb/.venv/bin/pytest tests/integration/test_load_balancer_integration.py -q
  • rtk /Users/yigitkonur/dev/codex-lb/.venv/bin/pytest tests/integration/test_openai_compat_features.py -q
  • rtk /Users/yigitkonur/dev/codex-lb/.venv/bin/pytest tests/integration/test_proxy_websocket_responses.py -q -k "replays_previous_response_after_restart or previous_response_id_is_scoped_to_api_key or retries_once_after_upstream_eof_before_response_created"
  • rtk /Users/yigitkonur/dev/codex-lb/.venv/bin/pytest tests/integration/test_http_responses_bridge.py -q -k "replays_previous_response_after_bridge_loss or replays_previous_response_without_opening_fresh_session or send_failure_replays_previous_response_from_snapshot or precreated_disconnect_replays_previous_response_from_snapshot"
  • rtk /Users/yigitkonur/dev/codex-lb/.venv/bin/python -m compileall app tests
  • rtk proxy openspec validate --specs

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces SQLite-backed durable continuity for Responses by persisting completed response “snapshots” and using them to replay previous_response_id across HTTP and WebSocket flows (including restart/bridge-loss scenarios), while also adding a narrow WebSocket retry for early upstream disconnects.

Changes:

  • Add durable response_snapshots storage (model + migration + repository) and persist completed Responses into it.
  • Resolve previous_response_id by replaying caller-scoped snapshot chains and prefer the originating account when selecting an upstream account.
  • Retry one in-flight WebSocket request when upstream disconnects before response.created, and update integration/spec coverage accordingly.

Reviewed changes

Copilot reviewed 20 out of 20 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
app/modules/proxy/service.py Implements snapshot replay for previous_response_id, snapshot persistence on completion, and early-disconnect WebSocket retry + preferred-account routing.
app/modules/proxy/response_snapshots_repository.py Adds repository for snapshot get/upsert across SQLite/PostgreSQL dialects.
app/modules/proxy/repo_bundle.py Wires response_snapshots into the proxy repository bundle.
app/modules/proxy/load_balancer.py Adds preferred_account_id handling and supports excluding accounts (used by retry + replay preference).
app/dependencies.py Provides ResponseSnapshotsRepository in the default repo factory.
app/db/models.py Adds ResponseSnapshot ORM model and continuity index.
app/db/alembic/versions/20260327_000000_add_response_snapshots.py Migration creating/repairing response_snapshots table and index.
tests/integration/test_openai_compat_features.py Adds restart + API-key scoping coverage for HTTP /v1/responses replay behavior.
tests/integration/test_proxy_websocket_responses.py Adds WebSocket replay + scoping tests and early-disconnect retry test.
tests/integration/test_http_responses_bridge.py Updates HTTP bridge behavior to replay from snapshots when continuity is lost.
tests/integration/test_migrations.py Verifies migrations create response_snapshots and repository scoping/created_at behavior.
tests/integration/test_load_balancer_integration.py Adds coverage for preferred-account selection and fallback.
app/core/openai/requests.py Exposes sanitize_input_items() for reuse in replay normalization.
openspec/... files Updates specs/change artifacts for durable snapshot replay + early-disconnect retry semantics.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +1109 to +1113
resolved_request = await self._resolve_previous_response_request(
responses_payload,
api_key_id=refreshed_api_key.id if refreshed_api_key else None,
)
responses_payload = resolved_request.payload
Copy link

Copilot AI Mar 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_prepare_websocket_response_create_request() always calls _resolve_previous_response_request(), which strips previous_response_id and rebuilds input from persisted snapshots. This means WebSocket requests never “prefer live upstream continuity” (forwarding previous_response_id to upstream) even when a live upstream session exists, which conflicts with the PR description/spec updates. Consider only replaying from snapshots when live continuity is unavailable (or trying upstream first and falling back to snapshot replay on previous_response_not_found).

Copilot uses AI. Check for mistakes.
Comment on lines +3348 to +3352
resolved_request = await self._resolve_previous_response_request(
payload,
api_key_id=api_key.id if api_key else None,
)
payload = resolved_request.payload
Copy link

Copilot AI Mar 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_stream_with_retry() now resolves any non-empty previous_response_id via persisted snapshots and raises invalid_request_error when no snapshot exists. This will break existing integration expectations that previous_response_id is forwarded unchanged when the HTTP bridge is disabled (e.g., tests in test_proxy_sticky_sessions.py and WebSocket forwarding tests). If the new contract is intentional, those tests should be updated/removed; otherwise, consider allowing pass-through to upstream when snapshots are missing but live continuity might still exist.

Suggested change
resolved_request = await self._resolve_previous_response_request(
payload,
api_key_id=api_key.id if api_key else None,
)
payload = resolved_request.payload
original_payload = payload
try:
resolved_request = await self._resolve_previous_response_request(
payload,
api_key_id=api_key.id if api_key else None,
)
payload = resolved_request.payload
except AppError:
# If resolving the previous_response_id via snapshots fails (e.g., no snapshot
# exists), fall back to the original payload so that previous_response_id is
# forwarded unchanged to the upstream, preserving existing integration behavior.
payload = original_payload

Copilot uses AI. Check for mistakes.
@Soju06
Copy link
Copy Markdown
Owner

Soju06 commented Mar 30, 2026

@codex review

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6355223279

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +2477 to +2479
if retried is not None:
account, upstream = retried
account_id_value = account.id
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Synchronize retried upstream with the main websocket loop

When a pre-response.created disconnect is retried, this code updates account/upstream only inside _relay_upstream_websocket_messages. The outer proxy_responses_websocket loop still retains the original disconnected socket, so a later client message on the same downstream websocket is sent to stale upstream state and can fail with stream_incomplete even though the retry succeeded. This breaks continued multi-turn websocket use after transient upstream EOFs; the retried connection needs to be propagated back to shared state used by the sender loop.

Useful? React with 👍 / 👎.

@yigitkonur yigitkonur closed this Apr 3, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants